Conditional Independence in Sample Selection Models

Econometric  sample  selection  models  typically  use  a  linear  latent-index  with  constant  coefficients 
to  model  the  selection  process  and  the  conditional  mean  of  the  regression  error  in  the  selected  sample. 
A  feature  common  to  most  of  these  models  is  that  the  conditional  mean  function  for  regression  errors  is 
an  invertible  function  of  the  selection  propensity  score,  i.e.,  the  probability  of  selection  conditional  on 
covariates.  Consequently,  conditioning  on  the  selection  propensity  score  controls  selection  bias,  a  fact 
which  underlies  much  of  the  recent  literature  on  non-parametric  and  semi-parametric  selection  models. 
This  literature  has  not  addressed  the  question  of  whether  the  propensity-score  conditioning  property  is 
necessarily  a  feature  of  sample  selection  models.  In  this  note,  I  describe  the  conditional  independence 
properties  that  make  it  possible  to  use  the  selection  propensity  score  to  control  selection  bias  in  a  general 
sample  selection  model.  The  resulting  characterization  does  not  rely  on  a  latent  index  selection 
mechanism  and  imposes  no  structure  other  than  an  assumption  of  independence  between  the  regression 
error  term  and  the  regressors  in  random  samples.  This  approach  leads  to  a  simple  rule  that  can  be  used 
to  determine  if  conditioning  on  the  selection  propensity  score  is  sufficient  to  control  selection  bias. 
1. Introduction

Sample  selection  problems  occur  in  both  experimental  and  observational  studies.  Econometricians' 
interest  in  sample  selection  models  originated  with  research  in  labor  economics  using  the  distribution  of 
wages for workers to infer the distribution of offered wages for everyone (Gronau, 1974; Heckman, 1974).
A  similar  problem  occurs  when  instrumental  variables  that  are  being  used  to  correct  for  endogeneity  in 
wage  equations  also  affect  employment  status  (see,  e.g.,  Angrist,  1995).  In  experiments,  the  bias  induced 
by  controlling  for  "post-treatment  variables"  that  are  correlated  with  the  outcome  of  interest  can  also  be 
interpreted  as  a  form  of  selection  bias  (Ham  and  Lalonde,  1996;  Rosenbaum,  1984). 

The  traditional  econometric  solution  to  problems  of  this  type  invokes  an  assumption  of  joint 
normality  for  the  regression  error  and  the  error  term  in  a  latent-index  selection  mechanism.  This  approach 
leads  to  a  maximum  likelihood  or  two-step  selection  correction  in  the  form  of  inverse  Mills-ratio  terms 
added  to  the  regression  function  of  interest  (see  Heckman,  1976  and  1979).  A  number  of  less  restrictive 
semi-parametric  variations  on  this  normal  selection  model  have  also  been  introduced  (see,  e.g.,  Ahn  and 
Powell,  1993,  and  references  in  Newey,  Powell,  and  Walker,  1990  and  Heckman,  1990).  Both  the 
parametric  and  semi-parametric  formulations  have  two  common  threads.  First,  a  latent  index  framework 
is  used  to  characterize  the  mean  of  unobserved  regression  error  terms  conditional  on  the  regression 
covariates  and  the  sample  selection  rule.  Second,  the  problem  of  selection  bias  is  either  implicitly  or 
explicitly  handled  by  conditioning  on  an  invertible  function  of  the  selection  propensity  score,  i.e.,  the 
probability  of  selection  given  covariates.1 

1 The propensity-score terminology originates with Rosenbaum and Rubin (1983, 1984), who
introduced the propensity-score method of controlling for confounding in evaluation research.

This paper describes the conditional independence properties that make it possible to use the
selection propensity score to control selection bias in regression estimates. The problem studied here
includes applications involving experimental data and instrumental variables as special cases. My first
objective is a nonparametric formulation of the sample selection problem where the only structure consists
of the regression equation of interest and an independence restriction on the regression error. The
elementary properties of conditional independence, as discussed by Dawid (1979) and Florens and
Mouchart (1982), are then applied to this formulation. I believe this approach provides a very general
description of the identification problem in sample selection models. It also leads to an easily interpreted
rule that can be used to check when nonparametric conditioning on the selection propensity score is indeed
sufficient to control selection bias.

2.  Motivation 

The equation of interest is

$$y_i = X_i'\beta + \varepsilon_i, \tag{1}$$

where $y_i$ is the dependent variable and $X_i$ is the $k \times 1$ vector of regressors, assumed to be independent of
$\varepsilon_i$ in the population. We observe a random sample of $n$ observations on $X_i$, but $y_i$ is not observed for
everybody. The coefficient vector is assumed to be such that $X_i'\beta$ gives the population conditional
expectation function, so that if we did observe $y_i$ for everyone in the sample, $\beta$ could be consistently
estimated by ordinary least squares (OLS). Although independence of regressors and errors is not
necessary to justify this definition of $\beta$, the full independence assumption is customary in papers on
selection bias (e.g., Chamberlain, 1986). At the end of the paper I show that full independence appears
to be of fundamental importance for identification strategies that use the selection propensity score to
control selection bias.

Let $w_i$ indicate selection status (e.g., $w_i = 1$ indicates workers in an application where $y_i$ is the log
of offered wages). The selection problem is often motivated using a latent index to capture the possibility
that $X_i$ affects $w_i$ as well as $y_i$. In particular, suppose that

$$w_i = 1[X_i'\delta - \eta_i > 0], \tag{2}$$

where $\eta_i$ is an error term that could be correlated with $\varepsilon_i$ but is assumed to be independent of $X_i$. Then
we have

$$E[\varepsilon_i \mid X_i, w_i = 1] = E[\varepsilon_i \mid X_i, X_i'\delta > \eta_i] \neq 0.$$

Selection bias arises because the term $E[\varepsilon_i \mid X_i, w_i = 1]$ is generally a function of $X_i$ even though
$E[\varepsilon_i \mid X_i] = 0$ in random samples.

Parametric solutions to selection problems of this type typically begin by assuming that the error
terms $(\varepsilon_i, \eta_i)$ are jointly normally distributed, so that

$$E[\varepsilon_i \mid X_i, w_i = 1] = \rho E[\eta_i \mid X_i, X_i'\delta > \eta_i], \tag{3}$$

where $\rho$ is the coefficient from a regression of $\varepsilon_i$ on $\eta_i$. Because of normality and joint independence of
$(\varepsilon_i, \eta_i)$ with $X_i$, we can simplify further:

$$E[\eta_i \mid X_i'\delta > \eta_i] = -\phi(X_i'\delta)/\Phi(X_i'\delta) = \lambda(X_i'\delta),$$

where $\phi$ and $\Phi$ are the normal density and distribution functions for $\eta_i$. Adding $\lambda(X_i'\delta)$ as a regressor
to (1) provides a feasible solution to the selection problem in this context because the required parameters
can be estimated from a Probit regression of $w_i$ on $X_i$.
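
As a numerical illustration of the correction term, the following Python sketch compares the closed form $-\phi(c)/\Phi(c)$ with a simulated mean of $\eta_i$ given $\eta_i < c$. It assumes that $\eta_i$ is standard normal (the unit variance is a simplification made here, not a requirement of the model), and the function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumption of this sketch: eta has unit variance


def selection_correction(index):
    """Closed form lambda(c) = -phi(c) / Phi(c), the mean of eta given eta < c."""
    return -STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


def simulated_truncated_mean(index, draws=200_000, seed=0):
    """Monte Carlo estimate of E[eta | eta < index] for standard normal eta."""
    rng = random.Random(seed)
    kept = [eta for eta in (rng.gauss(0.0, 1.0) for _ in range(draws)) if eta < index]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Print the formula and the simulated value side by side for a few index values.
    for c in (-1.0, 0.0, 1.0):
        print(c, selection_correction(c), simulated_truncated_mean(c))
```

In a two-step application the index $X_i'\delta$ would be replaced by its Probit estimate; that step is omitted here.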

Critics of traditional econometric selection models (e.g., Hartigan and Tukey, 1986; Little, 1985)
point out that these models combine a wide variety of assumptions that are hard to assess and interpret.
In particular, the validity of parametric corrections for sample selection bias clearly turns to a large extent
on distributional assumptions, functional form restrictions, and exclusion restrictions (i.e., zero restrictions
on $\beta$ and/or $\delta$) for which there is rarely, if ever, any empirical justification.2 These features of the normal
selection model have led econometricians to develop less restrictive models for selection problems. For
example, Newey, Powell, and Walker (1990), Lee (1982), and others have shown that the distributional
assumptions in these models can be relaxed in a semi-parametric framework.

2 Mroz (1987) documents the fact that empirical results can be sensitive to these assumptions. See
also Olsen (1980, 1982).

One approach to semi-parametric selection corrections begins with the observation that even in
the absence of parametric distributional assumptions, selection-correction terms typically consist of
nonlinear but invertible functions of the selection propensity score, $P[w_i = 1 \mid X_i]$. This feature of selection
models has been noted by Heckman and Robb (1986), and it motivates the semi-parametric estimators for
selection models discussed by Newey (1988), Robinson (1988), Choi (1992), and Ahn and Powell (1993).3
On the other hand, earlier work on semi-parametric selection models has not clarified whether the
propensity-score conditioning property is a necessary feature of the selection problem. In the next section,
I use simple conditional independence notation to provide a general characterization of models for which
this propensity-score conditioning property holds.

3.  Conditional  independence  and  sample  selection 

Conditional independence is defined as follows:

Definition 1 (Florens and Mouchart, 1982). Let $A_1$, $A_2$, $A_3$ denote random variables defined on a
common probability space with joint probability measure $P[A_1, A_2, A_3]$, and let $P[A_1 \mid A_2, A_3]$ denote a
conditional probability statement for $A_1$ given some values of $A_2$ and $A_3$. Then we write $A_1 \perp\!\!\!\perp A_2 \mid A_3$
if and only if $P[A_1 \mid A_2, A_3] = P[A_1 \mid A_3]$, a.s.

Note that conditional independence, like independence, is symmetric. The conditional independence
concept, although basic and perhaps even obvious, has proven widely useful. Dawid (1979) popularized
the notation, $A_1 \perp\!\!\!\perp A_2 \mid A_3$, and argued that many important ideas in statistics can best be understood in
these terms. Chamberlain (1982) and Florens and Mouchart (1982) used conditional independence ideas
to explore the relationship between alternate definitions of "causality" in time-series econometrics. And
Rosenbaum and Rubin (1983) used conditional independence arguments to establish the propensity-score
result for evaluation problems with selection on observables.4

3 Educational researchers and statisticians have also discussed conditioning on the selection
propensity score; see, for example, Powell and Steelman (1984) and Wainer (1986).

The properties of conditional independence can be characterized as follows:

Lemma 1. Let $R_1$, $R_2$, $R_3$, and $R_4$ be random variables defined on a common probability space with joint
probability measure. Then the following are equivalent:

(i) $R_1 \perp\!\!\!\perp R_2 \mid R_3$ and $R_1 \perp\!\!\!\perp R_4 \mid (R_2, R_3)$

(ii) $R_1 \perp\!\!\!\perp (R_2, R_4) \mid R_3$

(iii) $R_1 \perp\!\!\!\perp R_4 \mid R_3$ and $R_1 \perp\!\!\!\perp R_2 \mid (R_3, R_4)$

This is Theorem A.1 in Florens and Mouchart (1982), presented here in somewhat simpler notation, and
Lemma 4.3 in Dawid (1979). Dawid states the equivalence of (ii) and (iii) only, but this also implies the
equivalence of (i) and (iii) by using symmetry of $R_2$ and $R_4$ in part (ii).
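
The lemma can be checked mechanically on small discrete examples. The sketch below is only an illustration under assumed numbers: it builds two hypothetical joint distributions for four binary variables, one in which the first variable depends on the others only through the third, and one in which it depends directly on the fourth, and evaluates the three statements of the lemma with exact rational arithmetic. The helper names (`marginal`, `cond_indep`, `lemma_statements`, `build_pmf`) are invented for this example.

```python
from fractions import Fraction as F
from itertools import product


def marginal(pmf, keep):
    """Sum a joint pmf over every coordinate not listed in `keep`."""
    out = {}
    for point, prob in pmf.items():
        key = tuple(point[i] for i in keep)
        out[key] = out.get(key, 0) + prob
    return out


def cond_indep(pmf, a, b, c):
    """True if coordinates `a` are independent of coordinates `b` given `c`.

    Checks P(a, b, c) * P(c) == P(a, c) * P(b, c) for every combination.
    """
    p_abc = marginal(pmf, a + b + c)
    p_ac, p_bc, p_c = marginal(pmf, a + c), marginal(pmf, b + c), marginal(pmf, c)
    na, nb = len(a), len(b)
    for (key_ac, prob_ac), (key_bc, prob_bc) in product(p_ac.items(), p_bc.items()):
        if key_ac[na:] != key_bc[nb:]:
            continue  # the two keys refer to different values of the conditioning set
        kc = key_ac[na:]
        joint = p_abc.get(key_ac[:na] + key_bc[:nb] + kc, 0)
        if joint * p_c[kc] != prob_ac * prob_bc:
            return False
    return True


def lemma_statements(pmf):
    """Evaluate statements (i), (ii), (iii) for coordinates (R1, R2, R3, R4)."""
    r1, r2, r3, r4 = (0,), (1,), (2,), (3,)
    part_i = cond_indep(pmf, r1, r2, r3) and cond_indep(pmf, r1, r4, r2 + r3)
    part_ii = cond_indep(pmf, r1, r2 + r4, r3)
    part_iii = cond_indep(pmf, r1, r4, r3) and cond_indep(pmf, r1, r2, r3 + r4)
    return part_i, part_ii, part_iii


def build_pmf(r1_depends_on_r4):
    """Hypothetical joint pmf; all probabilities are made-up illustration values."""
    p24 = {  # P(R2, R4 | R3), with R2 and R4 dependent on each other
        0: {(0, 0): F(4, 10), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(4, 10)},
        1: {(0, 0): F(1, 10), (0, 1): F(2, 10), (1, 0): F(3, 10), (1, 1): F(4, 10)},
    }
    pmf = {}
    for r1, r2, r3, r4 in product((0, 1), repeat=4):
        driver = r4 if r1_depends_on_r4 else r3
        p_one = F(1, 4) if driver == 0 else F(3, 4)  # P(R1 = 1 | driver)
        p_r1 = p_one if r1 == 1 else 1 - p_one
        pmf[(r1, r2, r3, r4)] = F(1, 2) * p24[r3][(r2, r4)] * p_r1
    return pmf
```

In the first design all three statements hold, and in the second all three fail, which is what the equivalence requires.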

3.1  Selection  bias 

The problem of selection bias can be described in conditional independence terms with no
structure other than the regression of interest and the independence restriction on $\varepsilon_i$. This restriction is
stated formally below:

Assumption 1. Let $\{(y_i, X_i')';\ i = 1, \ldots, n\}$ denote i.i.d. observations on a $(k+1) \times 1$ random vector. Then
$\varepsilon_i \perp\!\!\!\perp X_i$, where $\varepsilon_i = y_i - X_i'\beta$ and $E[y_i \mid X_i] = X_i'\beta$.


4 The propensity-score result for evaluation research says that in an experiment where treatment is
randomly assigned conditional on covariates, it is enough to condition on the probability of selection
given covariates to eliminate confounding from any relationship between the covariates and the
treatment assignment.


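We would first like to know when a sample selection process causes Assumption 1 to fail. Let
$w_i$ indicate selection status as before. Instead of observing $\{(y_i, X_i')';\ i = 1, \ldots, n\}$, we observe
$\{(w_i y_i, w_i, X_i')';\ i = 1, \ldots, n\}$. Important consequences of the selection process are summarized below:

Proposition 1. Suppose that $X_i$ has at least 2 points of support and that $P[w_i = 1 \mid X_i]$ is a non-trivial
function of $X_i$. Then given Assumption 1, $\varepsilon_i \perp\!\!\!\perp X_i \mid w_i$ iff $\varepsilon_i \perp\!\!\!\perp w_i \mid X_i$.

Proof. Let $\Omega$ denote the sample space. Setting $R_1 = \varepsilon_i$, $R_2 = X_i$, $R_3 = \Omega$, and $R_4 = w_i$, parts (i) and (iii) of the
lemma imply $\{\varepsilon_i \perp\!\!\!\perp w_i \mid X_i,\ \varepsilon_i \perp\!\!\!\perp X_i\}$ iff $\{\varepsilon_i \perp\!\!\!\perp X_i \mid w_i,\ \varepsilon_i \perp\!\!\!\perp w_i\}$. Since $\varepsilon_i \perp\!\!\!\perp X_i$ is maintained,
$\varepsilon_i \perp\!\!\!\perp w_i \mid X_i$ implies $\varepsilon_i \perp\!\!\!\perp X_i \mid w_i$ and $\varepsilon_i \perp\!\!\!\perp w_i$, while $\varepsilon_i \not\perp\!\!\!\perp w_i \mid X_i$ implies either
$\varepsilon_i \not\perp\!\!\!\perp X_i \mid w_i$ or $\varepsilon_i \not\perp\!\!\!\perp w_i$ or both. To complete the proof, we need to rule out $\varepsilon_i \not\perp\!\!\!\perp w_i$ with
$\varepsilon_i \perp\!\!\!\perp X_i \mid w_i$ when $\varepsilon_i \not\perp\!\!\!\perp w_i \mid X_i$. Then $\varepsilon_i \not\perp\!\!\!\perp w_i \mid X_i$ always implies $\varepsilon_i \not\perp\!\!\!\perp X_i \mid w_i$. Note that

$$0 = E[\varepsilon_i \mid X_i] = E[E(\varepsilon_i \mid X_i, w_i) \mid X_i].$$

If $\varepsilon_i \not\perp\!\!\!\perp w_i$ with $\varepsilon_i \perp\!\!\!\perp X_i \mid w_i$, this implies

$$0 = E[E(\varepsilon_i \mid w_i) \mid X_i] = E(\varepsilon_i \mid w_i = 0) + [E(\varepsilon_i \mid w_i = 1) - E(\varepsilon_i \mid w_i = 0)]\, P[w_i = 1 \mid X_i].$$

But this cannot be true when $P[w_i = 1 \mid X_i]$ takes on 2 or more distinct values since $E(\varepsilon_i \mid w_i)$ is not zero
and $E(\varepsilon_i \mid w_i = 1)$ is not equal to $E(\varepsilon_i \mid w_i = 0)$.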

The property $\varepsilon_i \perp\!\!\!\perp X_i \mid w_i$ means that estimators which are based on Assumption 1 will be
consistent or unbiased in the selected sample as well as in a random sample. The property $w_i \perp\!\!\!\perp \varepsilon_i \mid X_i$
means that selection status is independent of the regression error conditional on $X_i$. Thus, any selection
mechanism that is related to the dependent variable conditional on the regressors causes Assumption 1 to
fail in the selected sample if selection status is also a function of $X_i$. Conversely, if Assumption 1 fails
in the selected sample, selection status must be related to the dependent variable. Of course, the first point
is well-known for latent-index models with homoscedastic errors (in which case, $w_i \perp\!\!\!\perp \varepsilon_i \mid X_i$ means
that $\varepsilon_i$ and the latent-index error $\eta_i$ are uncorrelated). The proposition shows that this and the converse
are general features of sample selection problems. In particular, these properties hold regardless of the
underlying selection mechanism or model.
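
A small simulation may help fix ideas about Proposition 1. The design below is hypothetical: the regressor is a fair coin, the regression error and a latent selection error are standard normal with correlation `rho`, and selection follows a latent index with made-up coefficients. The function reports the mean regression error among selected observations separately for each value of the regressor.

```python
import random


def selected_error_means(rho, n=200_000, seed=1):
    """Mean of eps among selected units, separately for X = 0 and X = 1.

    Assumed design (illustration only): X is a fair coin, eps and eta are
    standard normal with correlation rho, and w = 1[delta0 + delta1 * X > eta].
    """
    rng = random.Random(seed)
    delta0, delta1 = 0.0, 1.0  # hypothetical index coefficients
    totals = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        x = rng.randrange(2)
        eps = rng.gauss(0.0, 1.0)
        # eta is built so that corr(eps, eta) = rho and eta is independent of X
        eta = rho * eps + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        if delta0 + delta1 * x > eta:  # unit is selected
            totals[x] += eps
            counts[x] += 1
    return {x: totals[x] / counts[x] for x in (0, 1)}


if __name__ == "__main__":
    print("selection related to eps:  ", selected_error_means(rho=0.8))
    print("selection unrelated to eps:", selected_error_means(rho=0.0))
```

With `rho` set to zero, selection status is independent of the error given the regressor and the two selected-sample means agree up to simulation noise. With a nonzero `rho` they differ, so the error is no longer independent of the regressor in the selected sample.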

Proposition 1 applies equally to experimental data. For illustration, assume constant treatment
effects where $y_{i1} = y_{i0} + \beta$ and $y_{i0} = \mu + \varepsilon_i$ denote the counterfactual outcomes for treated and nontreated
individuals. Let $X_i$ be an indicator of randomized treatment assignment. By virtue of random assignment,
a simple comparison of means provides an unbiased estimator of $\beta$ in random samples from the population
at risk of treatment. But the independence between randomized treatment assignment and counterfactual
outcomes is lost whenever the experiment is analyzed conditional on a variable that is both correlated with
outcomes and affected by the treatment assignment. Suppose that $w_i$ is a second outcome variable in the
experiment, correlated with $y_i$ for some reason other than $X_i$. Examples of such correlated outcome
pairs include death and health status, or employment status and wages. Correlation between $w_i$ and $\varepsilon_i$
given $X_i$ means that $\varepsilon_i \not\perp\!\!\!\perp w_i \mid X_i$. If, as seems likely, $w_i$ is also affected by $X_i$, then Proposition 1
implies that estimation of $\beta$ conditional on $w_i$ is necessarily subject to selection bias.5

Finally, note that Proposition 1 is also relevant for IV estimation. For applications involving
instrumental variables, the key identifying assumption is that the instruments (playing the role of $X_i$) are
independent of the regression error term (defined around an endogenous regressor) in the target population.
The implications of Proposition 1 for IV estimation are that in cases where the sample selection rule
involves the instruments, the instruments will be independent of $\varepsilon_i$ in the selected sample if and only if
selection status is independent of $\varepsilon_i$ conditional on the instruments.


5Rosenbaum  (1984)  makes  a  similar  point  regarding  the  consequences  of  conditioning  on  "post- 
treatment  variables"  in  experiments. 


3.2  The  selection  propensity  score 

The selection propensity score is the conditional probability of selection given $X_i$. The fact that
the selection propensity score is a function of $X_i$ is emphasized by the notation

$$e(X_i) \equiv P(w_i = 1 \mid X_i),$$

which is used in the definition below:

Definition 2. The selection propensity score is said to control selection bias if $\varepsilon_i \perp\!\!\!\perp X_i \mid w_i, e(X_i)$.

If conditioning on the selection propensity score controls selection bias, it may be possible to estimate $\beta$
in the selected sample because then we have

$$E[y_i \mid X_i, w_i = 1] = X_i'\beta + E[\varepsilon_i \mid w_i = 1, e(X_i)]. \tag{4}$$

Of course, for this to be of any practical use, there must be enough variability in $X_i$ to identify $\beta$ once $e(X_i)$
is fixed. In the literature on semi-parametric selection models, this variation is obtained by imposing a
priori exclusion restrictions on $\beta$ and/or on $e(X_i)$. See Chamberlain (1986) for a precise statement of the
restrictions required for identification in this context. Here I focus on Definition 2 as a likely starting
point for any possible strategy to identify $\beta$.
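
The role of equation (4) and of exclusion restrictions can be illustrated with a simulated example. Everything in the sketch below is an assumption made for illustration: two binary regressors, a latent-index selection rule in which both enter with equal weight, an outcome equation from which the second regressor is excluded, and arbitrary parameter values. Because the cells (1, 0) and (0, 1) share the same selection index, they share the same propensity score, and the selection term in (4) cancels from their difference.

```python
import random


def selected_cell_means(rho=0.8, beta1=1.0, n=300_000, seed=2):
    """Mean of y in the selected sample for each (x1, x2) cell.

    Assumed design (illustration only):
      selection: w = 1[x1 + x2 - 1 > eta], so (1, 0) and (0, 1) share e(X);
      outcome:   y = beta1 * x1 + eps, with x2 excluded from the outcome equation;
      errors:    eps, eta standard normal with correlation rho, independent of X.
    """
    rng = random.Random(seed)
    sums, counts = {}, {}
    for _ in range(n):
        x1, x2 = rng.randrange(2), rng.randrange(2)
        eps = rng.gauss(0.0, 1.0)
        eta = rho * eps + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        if x1 + x2 - 1.0 > eta:  # outcome observed only for selected units
            key = (x1, x2)
            sums[key] = sums.get(key, 0.0) + beta1 * x1 + eps
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    means = selected_cell_means()
    # Same propensity score in both cells: the contrast is close to beta1.
    print("matched on e(X):    ", means[(1, 0)] - means[(0, 1)])
    # Different propensity scores: the contrast mixes beta1 with selection bias.
    print("not matched on e(X):", means[(1, 1)] - means[(0, 1)])
```

The matched contrast recovers the assumed coefficient up to sampling error, while the unmatched contrast does not. Without the exclusion of the second regressor from the outcome equation, the matched contrast would identify only a difference of coefficients.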

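The relationship between Definition 2 and the other conditional independence properties of sample
selection models is outlined in the following proposition:

Proposition 2. The following are equivalent:

(i) $w_i \perp\!\!\!\perp X_i \mid \varepsilon_i, e(X_i)$

(ii) $(\varepsilon_i, w_i) \perp\!\!\!\perp X_i \mid e(X_i)$

(iii) $\varepsilon_i \perp\!\!\!\perp X_i \mid w_i, e(X_i)$

Proof. First observe that $w_i \perp\!\!\!\perp X_i \mid e(X_i)$ because $P[w_i = 1 \mid X_i, e(X_i)] = e(X_i)$ and that $\varepsilon_i \perp\!\!\!\perp X_i \mid e(X_i)$
because $\varepsilon_i \perp\!\!\!\perp X_i$ and $e(X_i)$ is a function of $X_i$. Moreover, Lemma 1 implies:

$$\{\text{(i)},\ \varepsilon_i \perp\!\!\!\perp X_i \mid e(X_i)\} \iff \text{(ii)} \iff \{\text{(iii)},\ w_i \perp\!\!\!\perp X_i \mid e(X_i)\}. \tag{6}$$

Therefore, since $w_i \perp\!\!\!\perp X_i \mid e(X_i)$ and $\varepsilon_i \perp\!\!\!\perp X_i \mid e(X_i)$ hold automatically in this case, the result is
established.6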

Line (iii) of Proposition 2 is the same as Definition 2, i.e., this part of the proposition consists of
the statement that conditioning on the selection propensity score controls selection bias. Lines (i) and (ii)
of the proposition therefore provide equivalent necessary and sufficient conditions for Definition 2. Note
that while the statements $w_i \perp\!\!\!\perp X_i \mid e(X_i)$ and $\varepsilon_i \perp\!\!\!\perp X_i \mid e(X_i)$ automatically hold, line (ii) makes the
stronger statement that $(\varepsilon_i, w_i)$ are jointly independent of $X_i$ given $e(X_i)$. Of course, this joint
independence is not implied by marginal independence. Therefore, one consequence of the if-and-only-if
relationship between (ii) and (iii) is that conditioning on the selection propensity score does not necessarily
control selection bias.
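
A stylized counterexample shows how conditioning on the propensity score can fail. The numbers below are hypothetical and chosen only so that the arithmetic is exact: the regression error takes the values +1 and -1 with equal probability, independently of a binary regressor, and the probability of selection rises with the error at one value of the regressor and falls with it at the other.

```python
from fractions import Fraction as F

# Hypothetical selection probabilities P[w = 1 | x, eps], keyed by (x, eps).
SELECT_PROB = {
    (0, +1): F(4, 5), (0, -1): F(1, 5),
    (1, +1): F(1, 5), (1, -1): F(4, 5),
}
# eps is independent of X with mean zero.
EPS_PROB = {+1: F(1, 2), -1: F(1, 2)}


def propensity(x):
    """Selection propensity score e(x) = P[w = 1 | x]."""
    return sum(EPS_PROB[eps] * SELECT_PROB[(x, eps)] for eps in EPS_PROB)


def selected_error_mean(x):
    """E[eps | X = x, w = 1], computed by Bayes' rule."""
    weighted = sum(eps * EPS_PROB[eps] * SELECT_PROB[(x, eps)] for eps in EPS_PROB)
    return weighted / propensity(x)


if __name__ == "__main__":
    for x in (0, 1):
        print(x, propensity(x), selected_error_mean(x))
```

Both values of the regressor have propensity score 1/2, yet the mean error in the selected sample is 3/5 at one value and -3/5 at the other. The marginal statements hold, but the error and selection status are not jointly independent of the regressor given the propensity score.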

The equivalence of lines (i) and (iii) also has practical consequences because line (i) is easy to
verify in the context of specific models for the mechanism determining selection status. In particular, a
weak sufficient condition for (i) is given in Proposition 3:

Proposition 3. Suppose the selection mechanism is monotonic, i.e., for any two values $x^1$, $x^0$ in the
support of $X_i$, either

$$P[w_i = 1 \mid x^1, \varepsilon_i] \geq P[w_i = 1 \mid x^0, \varepsilon_i], \text{ a.s.,} \quad \text{or}$$

$$P[w_i = 1 \mid x^1, \varepsilon_i] \leq P[w_i = 1 \mid x^0, \varepsilon_i], \text{ a.s.,} \tag{7}$$

where conditioning on $\varepsilon_i$ is what makes these inequalities random. Then conditioning on the probability
of selection controls selection bias.


6 Choi (1992) shows that part (ii) of Proposition 2 [joint independence], imposed on a latent-index
error and the regression error [i.e., replacing $w_i$ with $\eta_i$ in (2)], implies part (iii).


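Proof. Note that if $e(x^1) = e(x^0)$,

$$0 = \int \{P[w_i = 1 \mid x^1, e(x^1), \varepsilon_i] - P[w_i = 1 \mid x^0, e(x^1), \varepsilon_i]\}\, h(\varepsilon_i)\, d\varepsilon_i,$$

where $h$ denotes the density of $\varepsilon_i$. Given a monotonic selection mechanism, the quantity in braces has the
same sign for almost every value of $\varepsilon_i$; without loss of generality, take it to be non-negative. For the integral
to equal zero, we must therefore have $P[w_i = 1 \mid x^1, e(x^1), \varepsilon_i] = P[w_i = 1 \mid x^0, e(x^1), \varepsilon_i]$. This argument shows
that for a monotonic selection mechanism,

$$P[w_i = 1 \mid x^1, \varepsilon_i] = P[w_i = 1 \mid x^0, \varepsilon_i] \text{ whenever } e(x^1) = e(x^0). \tag{8}$$

The proposition can then be established by observing that (8) implies line (i) of Proposition 2, i.e., that
the probability of selection is conditionally independent of $X_i$ given $e(X_i)$ and $\varepsilon_i$.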
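
For a discrete model, conditions (7) and (8) can be checked directly from a table of selection probabilities. The sketch below is a hypothetical illustration: the two tables, the labels for the regressor values, and the function names are invented, and the weak-inequality reading of (7) is assumed.

```python
from fractions import Fraction as F

EPS_PROB = {-1: F(1, 2), +1: F(1, 2)}  # hypothetical two-point error distribution

# P[w = 1 | x, eps] for a mechanism in which every x shifts selection one way.
MONOTONE = {
    ("a", -1): F(1, 5), ("a", +1): F(2, 5),
    ("b", -1): F(1, 5), ("b", +1): F(2, 5),
    ("c", -1): F(1, 2), ("c", +1): F(4, 5),
}
# A mechanism in which the effect of x changes sign with eps.
CROSSING = {
    ("a", -1): F(1, 5), ("a", +1): F(4, 5),
    ("b", -1): F(4, 5), ("b", +1): F(1, 5),
}


def is_monotonic(select_prob, xs, eps_prob):
    """Condition (7): each pair of x values is ordered the same way for all eps."""
    for x1 in xs:
        for x0 in xs:
            diffs = [select_prob[(x1, e)] - select_prob[(x0, e)] for e in eps_prob]
            if not (all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)):
                return False
    return True


def satisfies_eq8(select_prob, xs, eps_prob):
    """Condition (8): equal propensity scores imply equal P[w = 1 | x, eps]."""
    def score(x):
        return sum(eps_prob[e] * select_prob[(x, e)] for e in eps_prob)

    for x1 in xs:
        for x0 in xs:
            same_score = score(x1) == score(x0)
            same_rows = all(select_prob[(x1, e)] == select_prob[(x0, e)] for e in eps_prob)
            if same_score and not same_rows:
                return False
    return True


if __name__ == "__main__":
    print(is_monotonic(MONOTONE, ("a", "b", "c"), EPS_PROB),
          satisfies_eq8(MONOTONE, ("a", "b", "c"), EPS_PROB))
    print(is_monotonic(CROSSING, ("a", "b"), EPS_PROB),
          satisfies_eq8(CROSSING, ("a", "b"), EPS_PROB))
```

The first table passes both checks, as the proof requires of any monotonic mechanism. The second fails both: its two regressor values have equal propensity scores but different selection probabilities given the error.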

To get an idea of how general this result is, note that any latent-index selection mechanism with
constant coefficients and errors independent of $X_i$ is monotonic. This fact does not depend on other
assumptions about the error distribution. Monotonicity is satisfied by a wide range of less restrictive
models as well. Consider, for example, a latent index model with random coefficients:

$$w_i = 1[X_i'\delta_i - \eta_i > 0],$$

which can be re-written,

$$w_i = 1[X_i'\delta^* - \eta_i^* > 0],$$

where $\delta^* = E[\delta_i]$ and $\eta_i^* = \eta_i + X_i'(\delta^* - \delta_i)$ is an error term that is not independent of $X_i$ even if $\eta_i$ is
independent of $X_i$. As long as the coefficient $\delta_i$ has the same sign for all $i$, Proposition 3 implies that
conditioning on the selection propensity score controls selection bias in this model.

Finally, it is worth emphasizing the role played by full independence in obtaining the result that
conditioning on the selection propensity score controls selection bias. Suppose that instead of (1) we have
a heteroscedastic regression model,

$$y_i = X_i'\beta + [X_i'\gamma]\varepsilon_i = X_i'\beta + \xi_i, \tag{9}$$

where $\varepsilon_i$ is still assumed independent of $X_i$. The heteroscedastic error term, $\xi_i$, is only mean-independent
of $X_i$. In this case, conditioning on the probability of selection does not control selection bias because

$$E[y_i \mid X_i, w_i = 1] = X_i'\beta + E[\xi_i \mid w_i = 1, X_i] = X_i'\beta + [X_i'\gamma]\, E[\varepsilon_i \mid w_i = 1, e(X_i)]. \tag{10}$$

Equation (10) means that $E[\xi_i \mid w_i = 1, X_i]$ is a function of $X_i$ and not just $e(X_i)$. In other words,

$$\xi_i \not\perp\!\!\!\perp X_i \mid w_i, e(X_i).$$

Even if $E[\varepsilon_i \mid w_i = 1, e(X_i)]$ is fixed at a number $e$, estimates in the selected sample are biased and
converge to $\beta + \gamma e$. The identification failure occurs in this case in spite of the fact that the selection
mechanism still satisfies part (i) of Proposition 2, i.e., $w_i \perp\!\!\!\perp X_i \mid \xi_i, e(X_i)$.
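
The limit $\beta + \gamma e$ can be seen in a simple simulation. The design is hypothetical and chosen for transparency: a scalar positive regressor, a standard normal error, and selection of all observations with a positive error, so that the propensity score is 1/2 for everyone and the mean error in the selected sample is $\sqrt{2/\pi}$. The parameter values are arbitrary.

```python
import random


def selected_sample_slope(beta=1.0, gamma=0.5, n=200_000, seed=3):
    """No-intercept OLS slope of y on X in the selected sample.

    Assumed design (illustration only): scalar X uniform on [1, 2], eps standard
    normal and independent of X, y = X * beta + (X * gamma) * eps, and selection
    w = 1[eps > 0], so e(X) = 1/2 for every X.
    """
    rng = random.Random(seed)
    sum_xy = sum_xx = 0.0
    for _ in range(n):
        x = rng.uniform(1.0, 2.0)
        eps = rng.gauss(0.0, 1.0)
        if eps > 0.0:  # y is observed only for selected units
            y = x * beta + (x * gamma) * eps
            sum_xy += x * y
            sum_xx += x * x
    return sum_xy / sum_xx


if __name__ == "__main__":
    print(selected_sample_slope())
```

Under these assumptions the slope settles near $1 + 0.5\sqrt{2/\pi}$ rather than the true coefficient of 1, even though the propensity score is constant and therefore trivially held fixed.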

4.  Conclusions 

This  paper  lays  out  some  of  the  basic  conditional  independence  properties  of  sample  selection 
models,  focusing  on  those  that  justify  conditioning  on  the  selection  propensity  score  to  control  selection 
bias.  One  reason  I  think  this  approach  to  sample  selection  models  is  valuable  is  that  it  leads  to  a  very 
general  formulation  of  key  identifying  assumptions.  Another  is  that  it  leads  to  a  weak  sufficient  condition 
for  the  validity  of  identification  strategies  that  either  explicitly  or  implicitly  involve  conditioning  on  the 
selection propensity score. Finally, the results given here are cast in terms of potentially observable
quantities ($w_i$, $X_i$, and $\varepsilon_i$), and not inherently unobservable latent index error terms. This means that
researchers could in principle construct an experiment that would test the identifying assumptions.

An additional contribution of this paper may be to help bridge the gap between the approaches
taken by econometricians and statisticians to the problem of selection bias. Use of the propensity score
to  control  confounding  in  evaluation  research  was  introduced  by  Rosenbaum  and  Rubin  (1983)  and  many 
statisticians  are  familiar  with  this  idea.  In  a  discussion  of  econometric  models  for  program  evaluation, 
Holland  (1989,  page  876)  notes  that  econometricians  also  use  the  propensity  score,  but  he  suggests  that 
this  use  is  "quite  different  for  the  two  approaches."    Similarly,  Heckman  and  Robb  (1986)  argue  that  the 
econometric approach to selection models uses the propensity score "in a different way than that advocated
by  Rosenbaum  and  Rubin  [1983]."  This  paper  shows  that  differences  between  the  role  played  by  the 
propensity  score  in  the  econometrics  and  statistics  literature  may  not  be  as  great  as  previously  believed. 
In  evaluation  research,  conditioning  on  the  propensity  score  controls  for  the  bias  caused  by  an  association 
between  treatment  status  and  observed  exogenous  covariates,  while  in  selection  models,  conditioning  on 
the  propensity  score  controls  for  the  bias  caused  by  an  association  between  selection  status  and  observed 
exogenous  covariates.  In  both  cases,  conditioning  on  the  propensity  score  solves  a  problem  of 
confounding  that  arises  because  of  dependence  between  a  dummy  endogenous  regressor  and  other 
covariates. 